Day19 Dino train (Part 2)

11th iThome Ironman Contest (iThome 鐵人賽)


AI & Data

Part 19 of the series 人工智慧(RL系列) 完爆遊戲30天 (Artificial Intelligence, RL series: Beating Games in 30 Days)


皮卡喵

2019-10-04 22:02:42


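The previous chapter covered the initial setup and data backup. This chapter covers the replay memory and the exploration rate, epsilon.

Observe and training modes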

if observe:  # Observe mode: the model is never trained from here on; use it to watch a model you are already satisfied with
    OBSERVE = 999999999  # so large that the observation phase never ends
    epsilon = FINAL_EPSILON
    print("Now we load weight")
    model.load_weights("model.h5")
    adam = Adam(lr=LEARNING_RATE)
    model.compile(loss='mse', optimizer=adam)
else:  # Training mode
    OBSERVE = OBSERVATION
    epsilon = load_obj("epsilon")  # resume from the saved exploration rate
    model.load_weights("model.h5")
    adam = Adam(lr=LEARNING_RATE)
    model.compile(loss='mse', optimizer=adam)
t = load_obj("time")  # load the saved step count so training resumes where it was interrupted

Initializing the training variables

Declare the local variables used in each iteration of the loop.

while True:
    loss = 0  # loss value
    Q_sa = 0  # Q target, used later
    action_index = 0  # index of the action to execute
    r_t = 0  # reward
    a_t = np.zeros([ACTIONS])  # one-hot action array

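Taking an action

FRAME_PER_ACTION is set to 1 here, so the outer condition is always true and an action is chosen on every frame. Raise it only if you want one decision per several frames. The inner condition decides whether to explore randomly.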

if t % FRAME_PER_ACTION == 0:
    if random.random() <= epsilon:  # explore with probability epsilon
        print("----------Random Action----------")
        action_index = random.randrange(ACTIONS)
        a_t[action_index] = 1
    else:  # use the model output
        q = model.predict(s_t)
        max_Q = np.argmax(q)  # index of the larger Q value among the two actions
        action_index = max_Q
        a_t[action_index] = 1
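
The selection rule above can be tried in isolation. The following is a minimal sketch, not the author's code: `choose_action` is a hypothetical helper, the Q values are passed in as a plain list instead of coming from `model.predict`, and `ACTIONS = 2` is assumed from the "two actions" comment above.

```python
import random

ACTIONS = 2  # assumed: two actions, as in the comment above

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise take the best Q value."""
    if rng.random() <= epsilon:
        action_index = rng.randrange(ACTIONS)                          # explore
    else:
        action_index = max(range(ACTIONS), key=lambda i: q_values[i])  # exploit
    a_t = [0] * ACTIONS  # one-hot action vector (a list stands in for np.zeros)
    a_t[action_index] = 1
    return action_index, a_t

rng = random.Random(0)  # seeded so the demo is repeatable
greedy_index, greedy_action = choose_action([0.1, 0.7], epsilon=0.0, rng=rng)
random_index, random_action = choose_action([0.1, 0.7], epsilon=1.0, rng=rng)
print(greedy_index, greedy_action)  # 1 [0, 1]
print(random_index, random_action)
```

With epsilon at 0 the larger Q value wins every time; with epsilon at 1 every action is random, which is the behaviour at the two ends of the decay schedule.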

Decaying epsilon

Once the observation phase is over, epsilon decreases by a fixed amount each step until it reaches FINAL_EPSILON.

if epsilon > FINAL_EPSILON and t > OBSERVE:
    epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
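
To see what this schedule does over time, here is a small simulation. It is only a sketch: the constant values below are assumptions chosen to keep the demo short, not the values used in the series, and `decay` is a hypothetical wrapper around the same condition.

```python
INITIAL_EPSILON = 0.1   # assumed starting exploration rate
FINAL_EPSILON = 0.0001  # assumed floor
EXPLORE = 1000          # assumed number of decay steps (small for the demo)
OBSERVE = 100           # assumed number of warm-up steps without decay

def decay(epsilon, t):
    """Apply one linear decay step, mirroring the condition used above."""
    if epsilon > FINAL_EPSILON and t > OBSERVE:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
    return epsilon

epsilon = INITIAL_EPSILON
history = []
for t in range(OBSERVE + EXPLORE + 50):
    epsilon = decay(epsilon, t)
    history.append(epsilon)

print(history[OBSERVE])  # still INITIAL_EPSILON: no decay during observation
print(history[-1])       # close to FINAL_EPSILON once decay has finished
```

Because the step is subtracted in floating point, the final value can land slightly above or below FINAL_EPSILON. If an exact floor matters, clamping with `max(FINAL_EPSILON, epsilon)` after the subtraction is one option.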

Interacting with the environment

x_t1, r_t, terminal = game_state.get_state(a_t)  # send the action to the environment
print('fps: {0}'.format(1 / (time.time() - last_time)))  # frames per second since the previous step
last_time = time.time()  # reset the timestamp
x_t1 = x_t1.reshape(1, x_t1.shape[0], x_t1.shape[1], 1)  # 1x80x80x1
s_t1 = np.append(x_t1, s_t[:, :, :, :3], axis=3)  # put the new frame first and drop the oldest (fourth) frame
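
The last two lines maintain a sliding window of four frames. The sketch below replays them on dummy data so the channel order is visible. The frame contents are made up for illustration: each old frame is filled with its channel number and the new frame with 9.

```python
import numpy as np

# Assumed state: four 80x80 frames stacked on the last axis, newest first.
# Channel i is filled with the value i so it can be tracked.
frames = [np.full((80, 80), i, dtype=np.float32) for i in range(4)]
s_t = np.stack(frames, axis=2).reshape(1, 80, 80, 4)

x_t1 = np.full((80, 80), 9, dtype=np.float32)            # stand-in for the new frame
x_t1 = x_t1.reshape(1, x_t1.shape[0], x_t1.shape[1], 1)  # 1x80x80x1

# Same update as above: new frame in front, first three old channels kept.
s_t1 = np.append(x_t1, s_t[:, :, :, :3], axis=3)

print(s_t1.shape)     # (1, 80, 80, 4)
print(s_t1[0, 0, 0])  # [9. 0. 1. 2.]
```

The channel that held the value 3, the oldest frame, is gone, and the new frame sits at index 0.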

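Storing transitions in the replay memory

This caps the size of the replay memory: once it holds more than REPLAY_MEMORY transitions, the leftmost (oldest) one is removed.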

D.append((s_t, action_index, r_t, s_t1, terminal))
if len(D) > REPLAY_MEMORY:
    D.popleft()
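
A small run makes the eviction visible. This is a sketch with placeholder transitions and an assumed capacity of 3. It also shows that `collections.deque` can enforce the cap itself through its `maxlen` argument, which is an alternative to the manual check and not what the code above does.

```python
from collections import deque

REPLAY_MEMORY = 3  # assumed tiny capacity so the eviction is easy to see

# Manual cap, as in the code above.
D = deque()
for step in range(5):
    # Placeholder transition: (state, action_index, reward, next_state, terminal)
    D.append((f"s{step}", 0, 0.1, f"s{step + 1}", False))
    if len(D) > REPLAY_MEMORY:
        D.popleft()  # drop the oldest transition

# Alternative: deque discards from the left automatically once maxlen is reached.
D_auto = deque(maxlen=REPLAY_MEMORY)
for step in range(5):
    D_auto.append((f"s{step}", 0, 0.1, f"s{step + 1}", False))

print([transition[0] for transition in D])  # ['s2', 's3', 's4']
print(list(D) == list(D_auto))              # True
```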

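Code

Main Dino training program: unit5_dino

Conclusion

That mostly wraps up the environment interaction and the replay memory. Tomorrow we start on the core DQN training!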

